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Abstract 

Gaussian graphical models represent the underlying graph structure of conditional dependence 
between random variables which can be determined using their partial correlation or precision 
matrix. In a high-dimensional setting, the precision matrix is estimated using penalized like¬ 
lihood by adding a penalization term which controls the amount of sparsity in the precision 
matrix and totally characterizes the complexity and structure of the graph. The most com¬ 
monly used penalization term is the LI norm of the precision matrix scaled by the regularization 
parameter which determines the trade-off between sparsity of the graph and fit to the data. 
In this paper we propose several procedures to select the regularization parameter in the esti¬ 
mation of graphical models that focus on recovering reliably the appropriate network structure 
of the graph. We conduct an extensive simulation study to show that the proposed methods 
produce useful results for different network topologies. The approaches are also applied in a 
high-dimensional real case study of gene expression data with the aim to discover the genes 
relevant to colon cancer. Using this data, we find graph structures which are verified to display 
significant biological gene associations. 

Keywords: sparse precision matrix, high dimension, clustering, gene expres¬ 

sion, graphical lasso, hyperparameter estimation 


1 Introduction 


In recent years, the study of undirected graphical models (Lauritzen, 2011) has been the 


focus of attention of many authors. The increasing volume of high-dimensional data in 
different disciplines makes them a useful tool in order to determine conditional dependence 
between random variables. For instance, graphical models have been applied to gene 


expression data sets to hnd biological associations across genes in Dobra et ah (2004) and 


Schafer and Strimmer 

12(1(1'. 

), as well as in other biological networks (^Newman 

2003 


and in social networks (Goldenberg, 2007). In Gaussian graphical models, which are 


often used for finding associations between genes using high throughput genomic data, 
the dependence between the genes is fully characterized by the non-zero elements of the 
precision matrix G (see Section 2.1). 
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However, in a high dimensional framework where the number of variables p is larger 
than the number of observations n, there is not enough information in the data available 
to estimate and hence the underlying conditional dependence (CD) graph. To address 
this problem, alternative estimators have been proposed in the last two decades using 
additional information about D such that the estimated covariance matrix and its inverse 
are of full rank. Typically, three classes of estimators of D have been used: thresholding 
(Bickel and Levina, 2008), shrinkage (Ledoit and Wolf, 2004; Daniels and Kass, 2001[ ) 
and penalized log likelihood (Tibshirani, [1996 ). 

In this paper we consider the latter kind of estimators, the graphical Lasso penalization 
method (defined in Section 2.1) which adds the penalty A||D||i with a tuning parameter 
A in the maximum likelihood. The penalized maximum likelihood optimization problem 
is solved using recursive algorithms, for instance we find that three of the most efficient 


and commonly used ways to solve it are GLasso by Friedman et ah (2007), Neighborhood 


selection by Meinshausen and Biihlmann (2006) and Tuning-Insensitive Graph Estimation 
and Regression by Liu and Wang ( |2012 ). The choice of the tuning parameter A represents 
the trade-off between close fit to the data and sparsity of D, and its selection for estimation 
of the corresponding CD graph structure is the topic of this paper. 

Methods such as Cross Validation (CV), Akaike Information Criterion (AIC) or Bayesian 
Information Criterion (BIC) have been widely used to select tuning parameters when p 
is small. However, they fail once dealing with high-dimensional problems by over-fitting 
the graph structure of D (Liu et ah, 2011; Wasserman and Roeder, 2009). 


Liu et ah (2011) proposed the selection of A by controlling the desirable approximated 


variability in the estimated graphs using a subsampling approach (StARS). This method 
contrasts with the usual variable selection statistics since it only considers the estimated 
CD graph structure. Even though the method is promising and gives an alternative to 
AIC and BIC, it has a major drawback: another tuning parameter is needed in order 
to set the maximum variability across samples which can be unknown a priori in many 
applications. Our simulations show that the default values can lead to overestimation of 


the network size in certain graph topologies. Meinshausen and Biihlman (2010) presented 
a stability selection approach which controls the graph edges false discovery rate. The 
authors estimate D by an average subsampling graphical Lasso method such that the 
effect of the choice of A is very low. However, the trade-off between false positive and 
true positive edges of the selected network by their subsampling approach is worse than 
the one given by a network with the same number of edges using all the data due to 
considering smaller effective sample sizes than the original n for estimation. To the best 
of our knowledge, there is no other relevant approach in the literature that only uses the 
graph structure to select the tuning parameter A in graphical models. 

We have applied the following methods for selecting A popular in statistical literature 
to estimate CD graph structures in microarray data: AIC, BIC and StARS. However, 


2 











































the graphs we have obtained were rather dense and very difficnlt to interpret to a bi¬ 
ologist, namely to extract groups of genes acting together and possibly interacting. In 
the biological literature, the most commonly used approaches to construct gene networks 
are based on clustering. This is informed by the expected presence of distinct strongly 


interconnected clusters in biological networks (Eisen and Spellman, 1998 Yi et ah , 2007). 
This gave us the motivation to hnd A such that the corresponding graph has a clustering 
structure which can be interpreted by a biologist without restricting it to a block diagonal 
structure and hence missing potentially important interactions. 

Our aim is to select the hyperparameter A such that (a) it produces reliable estimates 
of the edges of the graph (b) the corresponding CD graph structure is interpretable in 
terms of network characteristics and (c) works well for networks that arise in biologi¬ 
cal systems. In this paper, we propose several such approaches to selecting A, in the 
framework of a general two-step procedure. The main novelty with respect to classical 
approaches such as AIC or BIC is that we use only the graph structure of the GLasso 
estimator to tune the regularization parameter A. The hrst proposed approach. Path con¬ 
nectivity (PC), uses the average geodesic distance of estimated networks to hnd the graph 
that corresponds to the biggest change of the number of connections and is associated 
with splitting of clusters. The second method. Augmented mean square error (A-MSE), 
similarly to the StARS approach, controls the variability of the estimated networks in 
terms of graph dissimilarity coefficients using subsampling. The main difference from 
StARS is the additional bias term to avoid having a tuning parameter. We consider the 
bias with respect to an initial estimated graph structure which contains a desirable global 
network characteristic. For instance, we use the AGNES hierarchical clustering coeffi¬ 
cient (Kaufman and Rousseeuw, 2009), which is the third proposed method to choose A, 
to select the graph that presents the highest clustering structure. Although clustering 
methods exist in the literature, the novelty here is that we use them to select the penalty 
parameter A in Graphical Lasso estimation. 

We compare performance of the proposed approaches as well as of the StARS algo¬ 
rithm and of the standard AIG and BIG on both simulated and real data. The data 
is a microarray gene expression data set generated by the TGGA Research Network: 
http://cancergenome.nih.gov/, It contains 154 samples for patients with colon tu¬ 
mor and about 18k genes. We are particularly interested in Ending significant complex 
gene interactions reliably and relating the observed associations to pathway databases 
which describe known biochemistry connections between genes. Simulations and real 


data analysis are performed using the free statistical software R (R Gore Team, 2015). 

The rest of the paper is organized as follows. In Section 2 we introduce the tuning 
parameter selection methodology and in Section 3 we give their main algorithmic and 
computational information. In Section 4 we compare the performance of the methods 
using simulated data and then apply them to a gene expression dataset in Section 5, 
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2 Regularization parameter selection 


2.1 Gaussian graphical model 

We assume that the data are iid observations from a Gaussian model: Xj ~ Xp(0,G“^), 
f = 1,... ,n independently, assuming, without a loss of generality, that the mean is zero. 
Conditional dependence is totally characterized by the inverse covariance matrix G, also 
called the precision matrix. Two Gaussian random variables Xj and Xj are said to be 
conditionally independent given all the remaining variables if the coefficient Qij is zero. 
This is often expressed with a graph structure G in which each node represents a random 
variable and there is an edge connecting two different nodes if the correspondent element 
in the inverse covariance matrix is non-zero. 

The corresponding log likelihood function for G is £(G) = logdetG — tr{SQ) where 
S = n~^ If exists (p < n is a necessary condition), the MLE of G is given 

by S~^. However, in a high dimensional framework where the number of variables p is 
larger than the number of observations n, the matrix S is singular and so cannot be 
inverted. 

We make an additional assumption that the CD (conditional dependence) graph is 
sparse, and hence that the precision matrix G is sparse. Ideally, we would like to use 
a penalized likelihood estimator, with the penalty proportional to the number of non¬ 
zero elements in D. However, such optimization problem is non-convex and thus is very 
computationally intensive. In practice, a likelihood estimator with a convex penalty term 
proportional to the £i norm of D, a Graphical Lasso, is commonly used instead: 


^‘‘PML 


= argmax [logdet D — fr(S'f2) — A||G|| 


ij) 


( 1 ) 


where ||D||i = Yl^j=i Is the element-wise £i norm of the matrix G and PML stands 
for penalized maximum likelihood. For small A, the corresponding penalized estimator 
of D tends to be dense and in the extreme (A = 0) we are back to the initial Maximum 


Likelihood problem which may not have unique solution when p/n is large (Pourahmadi 


2011). As we increase A, the matrix becomes more and more sparse until we get a 


diagonal matrix. Therefore, the choice of A has a crucial effect on the estimated CD 
graph structure. 


2.2 General two step procedure to select the tuning parameter 

The £\ penalized maximum likelihood estimator dehned in (1) requires selection of a 
regularization parameter A. If the £i penalization genuinely represented our true prior 
knowledge about D then one of the standard methods such as the maximum marginal 
likelihood or cross validation for the elements of D could be used. However, the G 
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penalty here is used due to its computational convenience, replacing the £o penalty, so 
these methods are not appropriate. It is well known for the problem of estimating sparse 
vectors in high dimensions with the Lasso penalty, that the variable selection part, with 
an appropriate A, is consistent, however, the estimation of the non-zero values usually 


has some bias (Wasserman and Roeder, 2009 Gu et ah, 2013). This can be due to the 
convex relaxation of the desired £o penalty to the computationally efficient £i penalty. 
Therefore, in this paper we propose to employ the methods that use only the variable 
selection part from the GLasso, G^, for tuning the hyperparameter A. 

We propose the following two step procedure for estimating A: 


1. Set ^pml equation (1) for all A G A, A C [0, Xmax], Xmax > 0. 

2. Ghoose A = argmin^ R(A, G'^) 

using risk functions R that are based only on GD graphs G^. This procedure combines 
computational efficiency of the Lasso algorithm with the choice of A that optimizes rele¬ 
vant characteristics of the GD graph such as connectivity, clustering structure, etc. 


2.3 Graph notation and distances 

Before introducing the risk functions, we give some basic definitions and properties of 


networks (Gosta and Rodrigues, 2007 Estrada, 2011) which will be used to select the 


regularization parameter. 

A graph G{V, E) is a set of nodes V, with connections between them, called edges E. 
The graph structure is often represented by a p x p matrix, called adjacency matrix and 
denoted by Ac- In the estimation of graphical models, the off-diagonal elements of Ac 
are determined by the precision matrix (0 if = 0 and 1 otherwise) and the diagonal 
elements are set to zero. Note that graphical models are undirected which means that 
the correspondent Ac is always symmetric. 

The distance between a pair of nodes V) and Vj G G(l/, E) (also known as the geodesic 
distance) defines the shortest number of edges connecting node V) to the node Vj which is 
denoted by If there is no path linking the two nodes, then gij = oo. The correlation 
coefficient aij between two nodes V), Vj G G(l/, E) and the corresponding dissimilarity 
measure dij are given by 


CTij = riij/yjKiKj, with 


dij 1 [^p] 


( 2 ) 


where rjij is the number of neighbors shared by the nodes Vi and Vj and Ki is the degree 
of the node Vi defined as the number of nodes that are directly connected to Vi. 
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2.4 Proposed risk functions 


We propose several risk functions to select A that monitor network characteristics of 
the conditional dependence graphs that can be applicable to genomic data. It has been 
observed (Yi et al., 2007) that molecules in a cell work together in groups, with some - 
usually less strong - interaction between the groups. This motivates our choice of risk 
functions to encourage a clustering structure in the estimated graphs. 


2.4.1 Path connectivity risk function 

To motivate the hrst proposed risk function, we observe the following obvious property 
of the graph that corresponds to the penalized estimator dehned by (1): for small 
A, the likelihood term dominates and the estimator is usually a dense graph with 
closely htting the data, and for large A, the penalty term dominates and the corresponding 
estimate is a very sparse graph with not htting the data well. Thus, for growing values 
of A, there is a decrease in graph complexity, and the aim of the method we propose here 
is to capture the value of A that corresponds to the largest change in the complexity of 
the graph. 

For simplicity, we consider a grid of values of A, A = (Afc)^^ such that Xk — Xk-i = h, 
k = 2,..., M, and the underlying estimated graphs G^ for all A G A. We propose Path 
connectivity (PC) which is a novel approach to hnd A that hnds the biggest change 
in graph complexity between the graphs G^ corresponding to two consecutive values of 
A G A. In this case the measure of graph complexity is calculated by the geodesic distance 
mean statistic 

H{X) = ^ ^gij{X)I{gij{X) < oo), (3) 

where gij{X) are the dissimilarity coefficients for the graph G^. To hnd the largest change 
in H{\), we consider the hrst order diherences of H{\) by Dh{X) = AhH{\), where 
Ah refers to the diherence operator with bandwidth h. The regularization parameter 
selection by PC is given by the A that produces the most rapid relative descent in the 
number of graph connections: 

Ape = argmaxi?pc(Afc) = argmax |T),,(Afe)/Z)fe(Afc) I , (4) 

where Xk is the fc-th ordered element in A and Dh{Xk) is the running average dehned 
as the average of elements Dh{X) with A G {Ai,..., Afc}. The diherence of the geodesic 
distance mean is divided by Dh{Xk) in (4) to favor big jumps for larger Xk (and sparser 
G^) in comparison to the jumps for smaller Xk which correspond to more dense graphs. 

In Figure 1 we illustrate the motivation of using the PC selection of A in simulated 
data (see Section 4 for details). The true CD graph structure dehned by three non- 
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overlapping clusters is plotted in Figure 1(a). We show the geodesic distance mean as 
function of A for graph estimations in Figure 1(d). It shows two big jumps which are 
related to the separation of clusters. The latter one gives the selected graph by PC 
and is due to the separation of two clusters (see Figure 1(b) for the selected Ape = A^ 
and Figure 1(c) for the previous graph structure dehned by Afc_i). This is a generally 
observed behaviour in both simulated and real gene expression datasets. In Figure 1(e) 
we show the density estimates of Ape using 100 iid datasets with n = 100, p = 170 and 
two theoretical graph structures: hubs-based clustered graph as shown in Figure 1(a) and 
non-clustered/random graph structure as shown in Figure 1(f). We can see the clear peak 
around A = 0.33 for the clustered data against a relatively flat empirical distribution for 
the non-clustered data. 



(a) True clustered 
network 


(b) Est. graph with 
A = A^i 


'pc 


(c) Est. graph with 
A = Ape 1 ■ 




(d) Geodesic distance (e) Densities of 

mean A = Ape 


(f) True non-clustered 
network 


Figure 1 . Path connectivity regularization parameter selection using a clustered -true network in (a) 
and solid line in (e)- and a non-clustered -true network in (f) and dashed line in (e)- graph structures. 


2.4.2 A-MSE risk function 

The idea explored in this section is to use a risk function based on network characteristics 
such as dissimilarities of the graph dehned by (2). Ideally, we would like to hnd 
that minimizes 

Rmse{X) = \dij - 4(A)r), (5) 

i>j 
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for some g > 1 where dij are the dissimilarities of the true graph dehned by (2) and dij{\) 
are the dissimilarities of the CD graph estimated by (1) for a given tuning parameter A. 
RmseW depends on the unknown true graph structure of D; in practice, an unbiased 
estimator of RmseW is used, commonly obtained by subsampling (bootstrap, cross 
validation) by comparing estimated values to observations. However, the problem in this 
setting is that direct observations of dij are not available. 

To overcome this problems we propose to use an initial graph estimate G and its 
dissimilarities coefficients [dij] in place of observed data. Thus, we propose to use the 
following choice of A: 


^amse 


aigminR amseW = argmin- dij{X)\‘^, 
AeA AeA ^^ 

i>j 


( 6 ) 


where E indicates the estimation of the expected value using subsampling. The algorithm 
is given in Section 3.2, We find that Xamse can approximate well in our simulated 

data (see Section 5 in Supplementary material). 

For g = 2, this risk function can be written as a sum of the variance terms and the 
sum of the squared differences between the initial and the current estimator (the “bias” 
term); see equation (13) in Section 3.2, Note that the first summand in (13), the variance 
of the estimated distances, gives a stability measure similar to the one proposed in StARS 
(the latter uses the adjacency matrix instead of the dissimilarities). However, we add a 
bias term for the distance estimator which allows us to avoid the selection of the power 


tuning parameter jS that controls the desired variability in the StARS approach (Liu 


et al., 2011). 


The proposed Ramse{X) risk can be applied to other network characteristics. By 
the definition of graph dissimilarities, dij = 1 if nodes i and j are neither directly nor 
indirectly (share neighbor) connected. Defining hij = 0 if aij = 1 — dij = 0 and hij = 1 if 
CTjj > 0, for sparse networks, there are many hij = 0 and only few h^j = 1. Applying the 
Ramse{X) to [hij] instead of [dij], we obtain 


R’lMSEW = '^J2^hij-kj{X)Y = C^, + E G-‘2h,j) = Ch + E[TP{X)-FP{X)] 

i<j (P)6e(A) 


where ^(A) = i < j khij{X) = 0}, FP(A) = Y.i<j^[hi 3 = 0, hj{,X) = 1], TP(A) = 

'n,i<j^[^ij — ^ijW = 1] ^cid Ch is independent of A. . Minimizing RamseW is the 
same as maximizing the TP and FP differences (also known as Youden indices). 

In practice, biologists often use clustering algorithms to discover groups of genes. 
Hence, we propose to use the output of a hierarchical clustering algorithm as an initial 
estimate of the graph to characterize global structure for the dissimilarities [dij]. We have 
investigated several clustering algorithms on real and simulated data, and we have not 






found much difference in the resulting graph estimate. Below we present the algorithm 
based on AGNES clustering method. 


2.4.3 AGNES risk function 


Clustering of features using a dissimilarity measure has been intensively studied in the 
literature. Here we focus on the algorithm AGNES (AGglomerative NESting) which 


is presented in Kaufman and Rousseeuw (2009, chap. 5) and is implemented in the 


R package cluster (Rousseeuw et al., 2013). AGNES hnds clusters iteratively joining 


groups of nodes with the smallest average dissimilarity coefficient. This average is found 
by considering the dissimilarity coefficients between all possible pairs of nodes from two 
different clusters. Moreover, AGNES proposes an agglomerative coefficient (AC) that 
measures the average distance between a node in the graph and its closest cluster of 
nodes. We propose to choose A that maximizes the AC coefficient 


Xac = axgmax RagnesW = argmax AC'(A). (7) 

AeA AeA 

The details of the AGNES algorithm and the dehnition of the coefficient AC can be found 
in Section 3. 

The matrix of dissimilarities D obtained by (2) gives a good representation of the 
complexity of a given graph, so, in addition to being applied as an initial estimate for the 
A-MSE method described above, AGNES can also be used as a method of choosing A. 


2.5 Comparison of the methods 

In Table 12 we give some of the main properties of the 6 risk functions we want to compare 
which are the three proposed methods, as well as StARS (Liu et al., 2011| ), AIC and BIG. 
Likelihood-based risk functions to select A such as AIC and BIG are useful to compromise 
between goodness of £t to the data and model over-fitting. The additional AIC penalty 
(given hj p{p — 1)) is smaller than BIG (given hj p{p — 1) log(n)/2) even for very small 
n. Hence, the selection of A by AIC results in a denser CD graph structure of D than 
by BIG. StARS gives a good alternative to select A when estimating graph structures. It 
transforms the selection of A problem to the choice of the maximum expected variability 
that we allow in the graph. Even though such a choice is more intuitive than the direct 
selection of A, we find it difficult to use without any prior information; our simulations 
show that using the default value of the tuning parameter results in high number of false 
positive edges (see Section 4.4). 

We provide two computationally fast approaches, AGNES and PC, and the slightly 
more computationally challenging A-MSE method due to subsampling. The AGNES se¬ 
lection tends to hnd the most clustered graph possible such that different groups of nodes 
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can be interpreted and analyzed. This is found to be a good choice of A to recover global 
graph structure characteristics when the true precision is block diagonal (See Section 4 
in the supplementary material). The A-MSE selection uses the AGNES estimator as the 
initial graph structure with the aim to improve estimations of local network characteris¬ 
tics. The value of A selected by A-MSE is at least as large as the one given by the initial 
estimator (AGNES), and it is used to stabilize the trade-off between false positive and 
true positive edges in the original estimator (AGNES) when n is small (for details see 
Section 4.4). Moreover, as the sample size increases, the value of A chosen by the A-MSE 
method tends to the original estimator of A (AGNES). We use Path connectivity as the 
initial good choice of A to hnd the most sparse graph that is easy to interpret. Starting 
from the sparsest graph and proceeding to denser graph structures, the PG method mon¬ 
itors the first big change in connectivity of the estimated networks, which is frequently 
associated with cluster agglomerations. 

Table 1. Risk functions main characteristics. 


method 

penalized 

likelihood 

uses network 
characteristics. 

subsampling 

fully 

automatic 

fast 

very sparse 
graph estimates 

PC 


/ 


/ 

/ 

/ 

A-MSE 


/ 

/ 

/ 


/ 

AGNES 


/ 


/ 

/ 


StARS 


/ 

/ 




BIG 

/ 



/ 

/ 

/ 

AIC 

/ 



/ 

/ 



3 Algorithms 

3.1 Path connectivity regularization parameter selection 

The procedure to select A by Path connectivity is detailed in Algorithm 1, It is generally 
fast and straightforward, i.e. does not require any additional tuning. 


Algorithm 1 Path connectivity algorithm 
1: procedure Rpc(A) 

2: Set A = (Afc)^]^ with \j. — \k-i = h, /c = 2,..., M. 

3: for k in 1 until M do; 

4: Estimate the graph using (1) and calculate its geodesic distance matrix 

[gij] as in (2). 

5: Galculate geodesic distance mean H{\k) = m~^ < oo) 

with m = p{p — l)/2. 

6: Galculate Dh{\k) = H{\k) — H{\k-i) and the running average Dh{\k) = l/{M — k — 
7: Return Dh{\k)/Dh{>^k), k = 2,...,M. 
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3.2 A-MSE regularization parameter selection 


For g = 2, the risk function RamseW presented in (6) can be decomposed by the sum 
of the variance and the squared bias, with the corresponding approximation given by 


RamseW — 531e(e|4(a)] 

i>j 


4(a))2 + (e[4(a)]-4(a,,))' 


( 8 ) 


Here E(E[(iij(A)] — djj(A))^ and E[(ijj(A)] —dij{Xac) are estimators of the variance of dij{X) 
and the bias of dij{X) with respect to dij{Xac) using subsampling. The subsampling pro¬ 
cedure to select Xamse IS presented in Algorithm 2, Following Meinshausen and Biihlman 


(2010) we choose the effective sample size B = O.dn since the procedure gets the closest 


to bootstrap. Nevertheless, other effective sizes could be used. For instance, Liu et ah 


(2011) use B = lO-y/n. 


Algorithm 2 Subsampling approach to approximate (13) 

1: procedure RAMSi?(A) 

2: Set A = {Xk)^=i and number of subsampling replicates T. 

3: for t in 1 until T do: 

4: Subsample B C {I : n} and set Xb = {Xj,j G B). 

5: Estimate the graphs &{Xk) for all A*, G A using Xb- 

6: Find dissimilarities of G\Xk) by (iF(Afc) = 1 - Viji^k)/K\{Xk)K^j{Xk). 

7: Estimate the average dij{Xk) over all T iterations. 

8: Return T“i Y^J=i{dij{h) - dij{><k)y for all A*, G A. 


3.3 AGNES regularization parameter selection 

Below is the AGNES iterative clustering algorithm, including the agglomeration coef- 
hcient that is used to select A. The input to the algorithm is a dissimilarity matrix 
D = [dij] = D{X) based on the graph corresponding to the estimator dehned by 
(1). AGNES performs hierarchical clustering by iteratively joining groups of nodes with 
the smallest average dissimilarity coefficient, starting with individual nodes as single clus¬ 
ters and hnishing with a single cluster of all p variables. Let {Gi \ ..., Gp'^) be a partition 
of (1 : p) at iteration f, and let denote a dissimilarity between clusters G^'* and Gm ■ 
We also record the dissimilarity for each node when it merges with another cluster or 
node for the hrst time, denoting it by S*, j = 1,... ,p, and the distance between the 
two clusters merged at the last step into the single cluster. The procedure is detailed in 
Algorithm 3, 

The coefficient AG{X) measures the average distance between a node in the graph 
and its closest cluster of nodes. When the dissimilarities within the clusters are small in 
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Algorithm 3 AGNES clustering algorithm 
1: procedure RagnesW 

2: Initialization: take each node as an individual cluster, i.e. set = {k}, k = 
1,... ,p, and - dissimilarity between nodes k and 

3: At iteration t > 0: 

4: Find pair of clusters (h, k) {h < k) with the smallest dissimilarity, i.e. 

(h, k) = argmin^l*-, 

i<j 


merge them, i.e. set = {Ck^^h'^} remove cluster h: = 0. 

Remaining clusters are unchanged: set = G]*^ for j ^ k, h. 

5: The dissimilarities change to 


rh+i) 

^j,h 


“ ^h,j 


= oo, 


dt+1) 

^k,j 


- 


1 r 


^k,j 


+ S. 


(t) 

j,h 


yj^k,h. 


6 : 

7: 


If |C<‘>| = 1, set St = 4«i if |C-<‘>| = 1, set St = S<‘l 

If the number of non-empty sets (clusters) in the newly formed partition 

is more than 1, then set t = t -|- 1 and go to step 3; otherwise set . 

Return 


AG(A) 



(9) 


comparison to the maximum dissimilarity, then 1 — S*/S^^^ is large for all j and AG(A) 
is consequently high. 

The time and total memory used in the AGNES algorithm increases exponentially 
as p grows. In order to make computations feasible in very high dimensions, we use 


an approximation of the measure by a variable subset selection approach (Kohavi and 


John, 1997). We consider the average AG coefficient with respect to A over several sets 


of variables. We validate the subsets R C {1 : p} of size \V\ using the coefficients of 
variation of the empirical degree distribution (k) dehned by GVy = sdv'(K)/Ey(/s;) with 
Ei/(«:) = 1/|R| K,j and sdy(K) = 1/(|R| - 1) “ Ev(k)) 2 (see Algorithm 4). 

We aim to hnd subset of variables whose number of edges is approximately proportional 
to those in the original matrix. In Section 4 of the supplementary material we illustrate 
how the variable subset approach reduces the computational time in high-dimensional 
simulated datasets. 


4 Simulated data analysis 

In this section we consider simulated data to test the performance of the regularization 
parameter selection methods using graph structures similar to what can be expected in 
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Algorithm 4 Subset selection for AGNES computations 
1: procedure S(A) 

2: Input: variables Vt = {1 p} and their degrees k = {ki, ..., Kp}. 

3: Compute CVy^. 

4: Select randomly m < p variables from the original data to form set Vq C 
5: Add all the nodes Vi in the adjacency matrix which have a path to at least one 
node in Vq. Use Vs = {Vq, Ui}. 

6: Compute CVy,. If \CYyJCVvt — 1| > r go to step 4, otherwise return Vg. 


biological networks. We analyze both the capacity to obtain the true connections and 
the accuracy in recovering network characteristics of the true graph. 


4.1 Graph topologies in biological data 

In real applications, the graph which dehnes causal connections between variables (e.g. 
genes, proteins, etc) is unknown but there is typically some knowledge about what kind 


of network structure can be expected (Newman, 2003). For instance, biological graph 


structures usually present associations in the shape of clusters, meaning that the nodes 
form groups that are more similar to the nodes within the group than to the nodes of 


other groups (Eisen and Spellman, 1998). In addition, network patterns can be dehned 


by the distribution of the variable pk, which denotes the fraction of nodes in the network 
that has degree k. Here we consider two different graph topologies: hubs-based and 
power-law. 

Hubs-based networks are graphs where only few nodes have a much higher degree (or 
connectivity) than the rest. This is a typical case in biological networks where nodes that 


behave as hubs may have different biological functions than the other nodes (Lu et ah 


2007). Power-law networks assume that the variable pk follows a power-law distribution 


Pk = k "/?(«), 

where fc > 1, a is a positive constant and the normalizing function <;{a) is the Riemann 


zeta function. Following Peng et al. (2009), a = 2.3 provides a distribution that is close 


to what is expected in biological networks. 


4.2 Simulated data 

We generate data from multivariate normal distributions with zero mean vector and 
several almost-block diagonal precision matrices, where each block (or cluster) has a 
hubs-based or power-law underlying graph structure (dehned in Section 4.1) and there 
are some extra random connections between blocks. The non-zero partial correlation 
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coefficients are simulated by 




Unif(0.5,0.9) if Eij = 1 with prob= 0.5 ; 

Unif(—0.5, —0.9) if Eij = 1 with prob= 0.5 ; 
0 


( 10 ) 


if Eij = 0. 


Then, we regularize which may not be positive dehnite, by + 51, with 

5 such that the condition number of is less than the number of nodes, so obtaining 


a positive dehnite matrix (Cai et ah, 2011). Note that such precision matrices are non¬ 


singular, sparse and with the non-zero elements bounded away from 0. 

We consider precision matrices with p = 50, 170, 290 and 500 and sample sizes n = 50, 
100, 200, 500. Different number of hubs, degree of hubs, and sparsity levels are considered 
in 60 simulated datasets for each combination of p and n. Full specihcation of simulated 
data is given in the supplementary material. 


We use the R package huge (Zhao et ah, 2012) to estimate CD graph structures 


by GLasso and Neighborhood selection (MB). The GLasso gives the estimated partial 
correlation matrix but MB only provides the estimated adjacency matrix. In order to 
compare the proposed methods to both AIC and BIG, here we only present the results for 
the GLasso procedure. Nevertheless, the performance of the methods using MB estimates 
is shown in the supplementary material. We take a sequence of 70 equidistant points for 
A going from 0.20 to 0.66 for small n and a sequence going from 0.03 to 0.40 for large 
n (the graphs almost have no change for A’s smaller than the lower limit with all nodes 
connected as well as higher than the upper limit with no edges across nodes). Then we 
select A by six different approaches: 1) PC; 2) A-MSE; 3) AGNES; 4) StARS; 5) BIG 
and 6) AIC. StARS (with f3 = 0.05) produces the lowest A for almost all the simulated 
datasets followed closely by AIC. The BIG results are strongly dependent on the sample 
size; the methods selects large tuning parameters for small n and low tuning parameters 
for large n in comparison to A-MSE. The AGNES selections are always larger than A- 
MSE but they get close when n increases. The PC A selections do not vary much for 
different n and p scenarios and produce similar magnitudes to A’s selected by A-MSE. 

We assess the performance of the A selection approaches for GLasso estimates using 
two different measures: squared errors in both the partial correlation matrix and the dis¬ 
similarity matrix dehned in (2) and graph recovery with a false positive and true positive 
analysis. The simulated data analysis is completed in the supplementary material where 
we compare for both GLasso and MB the selected graph structures and the true networks 
given global network characteristics as clustering, connectivity and graph topology. 
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4.3 Mean square errors 


To measure performance of the methods we use the ranks of the average mean square 
errors (MSE) of the partial correlation matrix hi (Table 10) as well as of the dissimilarity 
matrix D (Table 5). This second rate gives a good reference to determine if the estimated 
graph captures the true local structure. The lowest rank (rank = 1) is assigned to 
the lowest MSE and the largest rank (rank = 6) is for the largest MSE out of the six 
approaches. In the tables, we show the errors for the GLasso method. 

Even though StARS and AIC estimate G well, they produce larger errors than AGNES, 
A-MSE, PG and BIG when minimizing the MSE of the dissimilarity matrix. Particularly, 
A-MSE tends to be the best selection for this loss function for large n. We hnd that BIG 


does well for small n, contrarily of what is obtained in Liu et ah (2011), but tends to be 
unreliable for larger sample sizes. AGNES gives good ranks for the hubs-based scenarios, 
particularly when n is large, and PG is almost always among the three best methods. 


Table 2. Average ranks for the mean square errors of the precision matrix. 


Hubs-based Power law 


n 

50 

100 

200 

500 50 

100 

200 

500 

AGNES 

3.05 

3.55 

dimension p=50 

4.06 4.40 3.12 

3.73 

4.40 

4.71 

A-MSE 

4.33 

4.90 

5.22 

5.38 4.92 

5.47 

5.67 

5.78 

PC 

5.23 

5.80 

5.58 

5.15 4.58 

5.13 

4.85 

4.49 

StARS 

1.27 

1.49 

1.18 

1.28 1.17 

1.43 

1.04 

1.07 

BIG 

5.38 

3.73 

3.14 

3.06 5.33 

3.66 

3.08 

3.02 

AIC 

1.73 

1.52 

1.82 

1.73 1.90 

1.58 

1.96 

1.92 

AGNES 

2.81 

3.30 

dimension p=170 

4.07 4.63 2.69 

3.28 

3.91 

4.42 

A-MSE 

4.13 

4.92 

5.17 

5.52 4.79 

5.38 

5.43 

5.63 

PC 

5.31 

5.91 

5.61 

4.03 4.96 

5.55 

5.46 

4.93 

StARS 

1.00 

1.22 

1.00 

1.00 1.00 

1.00 

1.00 

1.05 

BIG 

5.56 

3.87 

3.14 

3.48 5.23 

3.78 

3.20 

3.00 

AIC 

2.19 

1.78 

2.02 

2.35 2.33 

2.00 

2.00 

1.97 

AGNES 

2.54 

3.02 

dimension p=290 

3.97 4.38 2.33 

3.07 

3.87 

4.20 

A-MSE 

4.17 

4.83 

5.08 

5.30 4.83 

5.39 

5.46 

5.46 

PC 

5.12 

5.91 

5.78 

4.83 4.68 

5.57 

5.53 

5.33 

StARS 

1.00 

1.01 

1.00 

1.00 1.00 

1.00 

1.00 

1.02 

BIG 

5.71 

4.23 

3.13 

3.29 5.47 

3.98 

3.14 

3.02 

AIC 

2.46 

1.99 

2.03 

2.20 2.68 

2.00 

2.00 

1.98 

AGNES 

2.13 

3.00 

dimension p=500 

3.92 4.30 2.11 

3.00 

3.62 

4.11 

A-MSE 

4.28 

4.78 

5.13 

5.35 4.81 

5.25 

5.27 

5.47 

PC 

4.94 

5.97 

5.60 

4.85 4.63 

5.67 

5.73 

5.39 

StARS 

1.00 

1.01 

1.00 

1.00 1.00 

1.00 

1.00 

1.00 

BIG 

5.78 

4.25 

3.31 

3.32 5.55 

4.08 

3.38 

3.03 

AIC 

2.88 

2.00 

2.05 

2.18 2.90 

2.00 

2.00 

2.00 


4.4 Graph recovery 

In order to quantify how well the algorithms recover the non-zero elements in G we 
compare the true discovery rate (TDR), which can be defined by TDR = TP/{TP + FP) 
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Table 3. Average ranks for the mean square errors of the dissimilarity matrix. 


Hubs-based Power law 


n 

50 

100 

200 

500 50 

100 

200 

500 

AGNES 

2.88 

2.70 

dimension p=50 

2.23 2.09 3.60 

3.02 

2.38 

2.09 

A-MSE 

2.83 

2.47 

1.65 

1.44 2.12 

1.65 

1.20 

1.41 

PC 

3.52 

3.67 

2.75 

2.68 2.42 

2.22 

2.53 

2.52 

StARS 

4.16 

4.58 

5.81 

5.72 5.70 

5.58 

5.97 

5.96 

BIG 

3.83 

3.05 

3.41 

3.79 2.13 

3.12 

3.88 

3.98 

AIC 

3.77 

4.54 

5.16 

5.28 5.02 

5.42 

5.03 

5.04 

AGNES 

3.52 

2.98 

dimension p=170 

2.12 1.73 4.31 

3.68 

3.06 

2.32 

A-MSE 

2.62 

2.04 

1.65 

1.45 2.14 

1.58 

1.45 

1.40 

PC 

2.46 

2.49 

3.11 

3.83 2.12 

1.62 

1.73 

2.32 

StARS 

6.00 

5.78 

6.00 

6.00 6.00 

6.00 

6.00 

6.00 

BIG 

2.14 

2.52 

3.14 

3.34 1.80 

3.12 

3.77 

3.97 

AIC 

4.26 

5.18 

4.98 

4.65 4.62 

5.00 

5.00 

5.00 

AGNES 

4.16 

3.02 

dimension p=290 

1.83 1.77 4.67 

3.92 

3.08 

2.57 

A-MSE 

2.42 

1.97 

1.85 

1.48 2.17 

1.57 

1.53 

1.48 

PC 

2.11 

2.89 

3.50 

3.66 2.30 

1.50 

1.54 

1.98 

StARS 

6.00 

5.99 

6.00 

6.00 6.00 

6.00 

6.00 

6.00 

BIG 

2.01 

2.22 

2.85 

3.29 1.55 

3.01 

3.84 

3.98 

AIC 

4.31 

4.91 

4.97 

4.80 4.32 

5.00 

5.00 

5.00 

AGNES 

4.83 

3.25 

dimension p=500 

2.06 1.60 4.89 

4.00 

3.38 

2.56 

A-MSE 

2.51 

1.80 

1.94 

2.00 2.09 

1.72 

1.33 

1.42 

PC 

2.04 

3.12 

3.69 

3.83 2.32 

1.48 

1.68 

2.06 

StARS 

6.00 

6.00 

6.00 

6.00 6.00 

6.00 

6.00 

6.00 

BIG 

1.51 

1.95 

2.40 

2.78 1.60 

2.80 

3.61 

3.97 

AIC 

4.11 

4.88 

4.92 

4.79 4.10 

5.00 

5.00 

5.00 


with 

TP = 7^ 0 and ilij ^ 0), FP = ^ 0 and ilij = 0), 

i<j i<j 


for each of the estimated networks. In Figure 2, we show the average TDR in the 60 
simulations for all considered combinations of n and p. The TDR increases with n for 
AGNES, A-MSE and PC whereas for AIC and BIG it goes down. In this analysis we can 
see the limitations of the BIG method whose main goal is not the graph recovery of D. 
BIG passes from selecting very sparse graphs with more TP than FP when n is small to 
selecting much denser graphs with many more FP than TP when n is large. 


4.5 Summary 

In our simulations A-MSE turned out to be the best approach to recover the CD graph 
structure as can be seen in Table 5, BIG is also competitive when n is small, but it is not 
reliable when analyzing larger sample sizes. PC is computationally the fastest method 
and only does slightly worse than A-MSE in Table 5, Moreover, it generally obtains 
simple graph structures which result in comprehensible connectivity interpretations. The 
AGNES procedure is usually over-performed by the augmented version A-MSE for small 
n. Eor large n, AGNES and A-MSE have similar A selections with AGNES being sig- 
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Figure 2. True discovery rate for all A selection approaches and all combinations of p and n. The 
x-axis scale is n : log(n) 


nificantly faster than A-MSE. AIC and StARS (nsing its defanlt valnes) prodnce dense 
graph estimations and achieve the best resnlts when minimizing the mean sqnare error of 
Nevertheless, they fail to obtain interpretable network strnctnres dne to poor graph 
recovery. 


5 Application to colon cancer gene expression data 


We apply the methods in a real case stndy. A gene expression data set that can be 
obtained from the TCGA platform at https: //tcga-data.nci .nih.gov/tcga/, A total 
of 154 patients are examined, the gene expression prohling for 17,617 genes is obtained 
and normalized in each one of them for a colorectal tnmor sample. 

A redaction on the variable space is applied so that we only keep the most highly 
correlated genes. We use a hlter for the gene’s average square correlation with threshold 
equal to 0.04. Moreover, we add the non-hltered genes which have at least one correlation 
coefficient with the hltered genes larger than 0.5. This means a reduction to the 55% 
of the genes with a total of 9,723 genes left to analyze. We estimate CD graphs via the 


Neighborhood selection algorithm of Meinshausen and Biihlmann (2006). We compute 90 


different graphs given an equidistant sequence of A’s between 0.35 and 0.80. Values of A 
lower than 0.35 produce almost-fully connected graphs and values above 0.80 produce zero 
edges in the graph. We use the PC and A-MSE approaches to select one particular graph 
with Ape = 0.69 and \amse = 0.55. The graphical representation of the two underlying 
networks is presented in Figure 3, The graph by PC, with 4, 819 edges, shows a simpler 
structure compared to A-MSE, with 19,986 edges. 

We separate the graphs in different clusters by applying a Partitioning Around Medoids 
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(b) A-MSE selected graph. 


Figure 3. Best graphs by PC and A-MSE by using the estimated adjacency matrices by MB in the 
gene expression data. 


(Reynolds et al., 2006) on the shortest distance matrix. We choose the number of clusters 


manually by considering the largest rate of change in the within-subject and between- 
subject variation such that the PC graph structure contains 15 clusters and the A-MSE 
contains 18 clusters. 

To assess which biological processes may be linked to the clusters, we download 1,320 
gene sets from the MSig database (http://www.broadinstitute.org/gsea/msigdb/index.jsp), 
which represent canonical pathways compiled from two sources: KeGG (http://www. 
genome.jp/kegg/pathway.html)and Reactome(http://www.reactome.org/). For each 
pathway we test for a signihcant over-representation in a cluster by using Fisher’s exact 
test applied to the 2 x 2-table dehned by pathway and cluster membership with a Bonfer- 
roni correction for multiple testing. Note that we use the reduced selection of 9,723 genes 
here as ’’background”, i.e. the analysis corrects for any over-representation of a pathway 
in that selection. 

In the PG selected graph, 6 out of 15 clusters have at least one signihcant pathway 
list (at 0.01 signihcant level) and a total of 160 signihcant pathway lists. Moreover, in the 
A-MSE graph, 7 out of 18 have also signihcant pathways and a total of 122 signihcant 
pathway lists. Among the signihcant lists, PLKl, NEAT, DNA replication or adaptive 
immune system are pathways associated with tumor cells. 

6 Discussion 


In this paper we study the problem of choosing the regularization parameter A for Gaus¬ 
sian graphical models in high dimensional data assuming we have high level knowledge 
about the nature of the graph structures, namely strong clustering in the case of gene 


expression data (e.g. Eisen and Spellman, 1998). The methods we introduce here take 
this assumption into account by selecting A so that risk functions measuring the degree 
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of clustering (AGNES, A-MSE) or connectivity (PC) are optimized. We aim to select 
the sparsest graph such that the real cluster structure is maintained and at the same 
time it contains a good tradeoff between true and false positive edges. The proposed 
approaches to select the regularization parameter provide competitive results at a rel¬ 
atively high computational speed. They present more reliable results than the StARS 
approach which tends to overestimate the network size. The StARS method accounts 


for the stability of the estimated graphs and has been proven to work well in Liu et ah 


(2011). It depends, however, on another parameter which controls the maximum amount 
of variability in the graph. There is no straightforward choice for this parameter and our 
simulation study shows that using the default value of 0.05 StARS yields uninformative 
networks with a majority of edges being false positives. 

The Path connectivity approach introduced here provides a good compromise between 
estimating the structure well and the number false positive edges. The main characteristic 
of this approach is that it relies on the shortest distance between all pairs of nodes. 
Interestingly, this quantity tends to show a clear changepoint when studied as a function 
of A, at which the structure of the graph changes radically. It typically produces very 
informative graphs in all the tested simulated datasets and gives competitive results for 
the mean square error between dissimilarity matrices as discussed in Section 4.3, In the 
gene expression data set it also provides us with a clearly structured informative graph. 
PC gives an excellent hrst choice of A if we want to hnd an easily interpretable graph. 

The A-MSE, with initial graph structure given by the AGNES selected graph, is the 
best of all the approaches in terms of minimizing the MSE between the true distances 
and the estimated ones in the simulated data. Also, Xamse is always smaller than Xac 
leading to less complex graphs than the ones estimated by AGNES. This is a desirable 
property as we assume only a small proportion of non-zero elements in G and thus with 
increasing graph density the number of false positive edges grows much faster than the 
number of true positives. However, if the aim is to have fewer false negatives, that is, 
that as many as possible true edges are included at the expense of a higher number of 
false positives, then algorithms like AGNES and StARS are more appropriate. 

The analysis of the gene expression data underlines some interesting results. The 
obtained graphs present a cluster-based structure as we can see in Figure 3, Our new 
approach of choosing a regularization parameter. Path connectivity, leads to a sparse and 
clustered network that is easy to interpret. Closer investigation of the results shows that 
the clusters overlap signihcantly with a number of pre-dehned gene sets and regulatory 
pathways which indicates that our assumption of a sparse clustered structure leads us to 
biologically meaningful results. 

In conclusion, we hnd that approaches such as PC, A-MSE and AGNES, which 
use network characteristics for parameter selection, can be benehcial in estimating CD 
graph structures (sparse partial correlation matrices) for high-dimensional biological data. 
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While maintaining good statistical properties in terms of false discovery rates and mean 
square error, the resulting graphs tend to be easier to interpret from a biological perspec¬ 
tive and thus are more useful in applications compared to parameter selection methods 
based on penalized log likelihood such as AIC or BIC. 
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7 Supplementary Material 

7.1 Simulated data: graph information 

We generate data from a multivariate normal distribution using the next characteristics 
for the data: 

1. 4 sample sizes: n = 50, n = 100, n = 200, n = 500. 

2. 4 dimension sizes: p = 50, p = 170, p = 290, p = 500. 

3. Number of clusters (and variables per cluster) for each p setting: 1 (50), 3 (70, 60, 
40), 5 (70, 100, 40, 50, 30), 7 (100, 100, 80, 60, 60, 70, 30). 

4. Hubs-based and Power-law models to generate data. 

5. Probability for a variable to be a hub: 0.015 (only for hubs-based models). 

6. Degree of hub nodes generated by Uniform(5, h) where b is one third of the number 
of variables per cluster. 

7. Probability for presence of all remaining edges in hubs-based models: generated by 
Uniform(0.005,0.03). 

8. Probability for presence of edges in between clusters: generated by an Uniform(0, 0.1). 

9. Parameter a in power-law models: 2.3. 
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7.2 Simulated data: some of the generated graphs 

In Figure 4 we illustrate some of the simulated graph structures for all p and the two 
models (hubs-based and power-law). 



(a) p=50, hubs-based 


(b) p=50, power-law 



(c) p=170, hubs-based 


(d) p=170, power-law 




(e) p=290, hubs-based 


(f) p=290, power-law 



.. V.? . ^ 

... «•..* •: -r.v *• • •• •* •• 

*"'..'V 


(g) p=500, hubs-based (h) p=500, power-law 

Figure 4. Graph structure examples used in simulation data. 


7.3 Simulated data: regularization parameter selection 

In Table 4 and Table 5 we give the average ranks for the A selection magnitudes using 
the methods AGNES, A-MSE, PG, StARS, AIG (only GLasso) and BIG (only GLasso). 
The lowest rank (rank = 1) is assigned to the lowest choice of A and the largest rank 
(rank = 6 (GLasso), rank = 4 (MB)) is for the largest A out of all approaches. 
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Table 4. Average ranks for the A magnitude -GLasso estimates. 


n 


Hubs- 

based 




Power law 


50 

100 

200 

500 


50 

100 

200 

500 




dimension p; 

-50 





AGNES 

3.05 

3.55 

4.06 

4.40 


3.12 

3.73 

4.40 

4.71 

A-MSE 

4.33 

4.90 

5.22 

5.38 


4.92 

5.47 

5.67 

5.78 

PC 

5.23 

5.80 

5.58 

5.15 


4.58 

5.13 

4.85 

4.49 

StARS 

1.27 

1.49 

1.18 

1.28 


1.17 

1.43 

1.02 

1.04 

BIG 

5.38 

3.73 

3.14 

3.06 


5.33 

3.66 

3.08 

3.02 

AIC 

1.73 

1.52 

1.82 

1.73 


1.90 

1.58 

1.98 

1.96 




dimension p= 

M70 





AGNES 

2.81 

3.30 

4.07 

4.63 


2.69 

3.28 

3.91 

4.42 

A-MSE 

4.13 

4.92 

5.17 

5.52 


4.79 

5.38 

5.43 

5.63 

PC 

5.31 

5.91 

5.61 

4.03 


4.96 

5.55 

5.46 

4.93 

StARS 

1.00 

1.22 

1.00 

1.00 


1.00 

1.00 

1.00 

1.00 

BIG 

5.56 

3.87 

3.14 

3.48 


5.23 

3.78 

3.20 

3.02 

AIC 

2.19 

1.78 

2.02 

2.35 


2.33 

2.00 

2.00 

2.00 




dimension p= 

-290 





AGNES 

2.54 

3.02 

3.97 

4.38 


2.33 

3.07 

3.87 

4.20 

A-MSE 

4.17 

4.83 

5.08 

5.30 


4.83 

5.39 

5.46 

5.46 

PC 

5.12 

5.91 

5.78 

4.83 


4.68 

5.57 

5.53 

5.33 

StARS 

1.00 

1.01 

1.00 

1.00 


1.00 

1.00 

1.00 

1.00 

BIG 

5.71 

4.23 

3.13 

3.29 


5.47 

3.98 

3.14 

3.02 

AIC 

2.46 

1.99 

2.03 

2.20 


2.68 

2.00 

2.00 

2.00 




dimension p= 

-500 





AGNES 

2.13 

3.00 

3.92 

4.25 


2.11 

3.00 

3.62 

4.11 

A-MSE 

4.28 

4.78 

5.13 

5.36 


4.81 

5.25 

5.27 

5.47 

PC 

4.94 

5.97 

5.60 

5.72 


4.63 

5.67 

5.73 

5.39 

StARS 

1.00 

1.00 

1.00 

1.00 


1.00 

1.00 

1.00 

1.00 

BIG 

5.78 

4.25 

3.31 

3.89 


5.55 

4.08 

3.38 

3.03 

AIC 

2.88 

2.00 

2.05 

2.00 


2.90 

2.00 

2.00 

2.00 


Table 5. Average ranks for the A magnitude -MB estimates. 




Hubs- 

-based 


Power 

law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

2.04 

2.08 

dimension p=50 
2.18 2.15 

2.10 

2.33 

2.38 

2.78 

A-MSE 

3.20 

3.22 

3.50 3.45 

3.64 

3.72 

3.69 

3.89 

PC 

3.76 

3.71 

3.32 3.40 

3.26 

2.96 

2.93 

2.33 

StARS 

1.00 

1.00 

1.00 1.00 

1.00 

1.00 

1.00 

1.00 

AGNES 

2.00 

2.00 

dimension p=170 
2.02 2.13 

2.01 

2.02 

2.17 

2.42 

A-MSE 

3.27 

3.05 

3.10 3.26 

3.62 

3.52 

3.64 

3.74 

PC 

3.73 

3.95 

3.88 3.61 

3.38 

3.47 

3.18 

2.83 

StARS 

1.00 

1.00 

1.00 1.00 

1.00 

1.00 

1.00 

1.00 

AGNES 

2.00 

2.0 

dimension p=290 
2.00 2.05 

2.00 

2.00 

2.04 

2.15 

A-MSE 

3.42 

3.1 

3.12 3.26 

3.73 

3.69 

3.61 

3.63 

PC 

3.58 

3.9 

3.88 3.69 

3.27 

3.31 

3.35 

3.22 

StARS 

1.00 

1.00 

1.00 1.00 

1.00 

1.00 

1.00 

1.00 

AGNES 

2.00 

2.00 

dimension p=500 
2.00 2.07 

2.00 

2.00 

2.00 

2.06 

A-MSE 

3.55 

3.09 

3.04 3.21 

3.76 

3.62 

3.54 

3.63 

PC 

3.45 

3.91 

3.96 3.73 

3.24 

3.38 

3.46 

3.31 

StARS 

1.00 

1.00 

1.00 1.00 

1.00 

1.00 

1.00 

1.00 


7.4 Simulated data: MSE ranks for network statistics 

We compute four basic global network statistics to compare the performance of the meth¬ 
ods in terms of structure similarities with the theoretical graphs: 

• The harmonic mean of the geodesic distances in a graph is given by 

• The AGNES coefficient is defined in Section 3.3 in the article and aims to measure 
the clustering structure of the network 

• The Estrada index quantihes the complexity of the network in terms of subgraphs 
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and can be expressed for regular networks by 

EE{\) = ^exp(7j(A)), (12) 

i=i 

where 71 (A),, 7 p(A) are the eigenvalues of Aq. 

• The square average dissimilarity matrix is given by 

AD{\) = 2/[p(p - 1)] ^ 4(A). 

i<j 

Moreover, we also use the average MSE of the dissimilarity matrix and the degree dis¬ 
tribution of the estimated networks. From Table 3-7 and Table 8-13 we present the loss 
function ranks for the square differences between the estimated graph structures and the 
true network using GLasso estimates and MB estimates respectively. The lowest rank 
(rank = 1 ) is assigned to the lowest square error and the largest rank (rank = 6 (GLasso), 
rank = 4 (MB)) is for the largest square error out of all approaches. 


Table 6. Average ranks for the mean square errors of the harmonic mean -GLasso 
estimates. 




Hubs- 

•based 



Power 

law 


n 

50 

100 

200 

500 

50 

100 

200 

500 

AGNES 

2.07 

2.47 

dimension p=50 
2.23 2.16 

2.23 

2.77 

2.12 

2.11 

A-MSE 

2.48 

2.17 

1.62 

1.39 

3.10 

2.02 

1.47 

1.32 

PC 

4.47 

4.72 

3.88 

3.40 

3.09 

2.57 

3.27 

2.71 

StARS 

3.88 

4.54 

5.34 

5.45 

4.58 

5.31 

5.74 

5.92 

BIG 

4.68 

2.60 

3.24 

3.59 

4.11 

3.19 

3.62 

3.92 

AIC 

3.42 

4.51 

4.69 

5.01 

3.88 

5.15 

4.79 

5.01 

AGNES 

1.36 

2.10 

dimension p=170 
1.87 1.87 

1.89 

2.37 

2.19 

1.98 

A-MSE 

3.62 

2.42 

1.57 

1.27 

3.86 

3.15 

2.47 

1.55 

PC 

5.28 

5.59 

5.19 

4.71 

4.33 

4.07 

3.39 

3.17 

StARS 

3.25 

4.60 

5.23 

5.67 

3.80 

5.10 

5.47 

5.80 

BIG 

5.49 

2.25 

2.94 

3.17 

4.78 

2.27 

3.07 

3.72 

AIC 

2.01 

4.03 

4.20 

4.32 

2.34 

4.05 

4.42 

4.78 

AGNES 

1.39 

1.82 

dimension p=290 
1.65 1.78 

1.82 

2.23 

1.95 

1.93 

A-MSE 

4.03 

3.27 

2.03 

1.42 

4.35 

4.14 

3.12 

1.69 

PC 

5.12 

5.59 

5.62 

4.88 

4.27 

4.68 

3.97 

3.52 

StARS 

2.93 

4.41 

5.07 

5.50 

3.42 

4.48 

5.12 

5.67 

BIG 

5.71 

2.53 

2.62 

3.14 

5.28 

2.06 

2.74 

3.53 

AIC 

1.81 

3.38 

4.02 

4.28 

1.87 

3.40 

4.10 

4.65 

AGNES 

1.32 

1.27 

dimension p=500 
1.66 1.65 

1.57 

1.85 

1.89 

1.72 

A-MSE 

4.28 

4.20 

2.78 

1.88 

4.76 

4.40 

3.18 

2.14 

PC 

4.94 

5.97 

5.47 

5.17 

4.53 

5.42 

4.88 

4.06 

StARS 

2.48 

3.75 

4.96 

5.35 

2.92 

4.12 

4.93 

5.39 

BIG 

5.78 

3.18 

2.25 

2.82 

5.55 

2.17 

2.24 

3.31 

AIC 

2.21 

2.63 

3.88 

4.13 

1.68 

3.05 

3.87 

4.39 
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Table 7. Average ranks for the mean sqnare errors of the AGNES coefficient -GLasso 
estimates. 




Hubs- 

•based 



Power 

law 


n 

50 

100 

200 

500 

50 

100 

200 

500 

AGNES 

2.77 

3.03 

dimension p=50 
2.52 2.31 

2.35 

2.58 

2.18 

2.12 

A-MSE 

2.23 

2.22 

1.90 

1.68 

2.82 

2.48 

2.07 

1.74 

PC 

2.75 

2.03 

1.83 

2.23 

2.41 

2.13 

2.15 

2.17 

StARS 

5.46 

5.41 

5.33 

5.37 

5.62 

5.49 

5.33 

5.24 

BIG 

2.60 

2.93 

3.98 

4.03 

2.98 

2.98 

3.62 

3.99 

AIC 

5.19 

5.38 

5.44 

5.39 

4.83 

5.33 

5.66 

5.72 

AGNES 

2.98 

3.30 

dimension p=170 
2.65 2.13 

2.33 

2.80 

2.27 

2.27 

A-MSE 

1.95 

1.94 

1.70 

1.50 

2.83 

2.45 

2.33 

1.68 

PC 

2.52 

1.96 

1.84 

3.38 

3.14 

2.37 

1.92 

2.22 

StARS 

5.98 

5.78 

4.78 

3.83 

6.00 

6.00 

4.78 

4.45 

BIG 

3.26 

2.80 

4.03 

4.16 

3.50 

2.38 

3.68 

4.38 

AIC 

4.31 

5.22 

6.00 

6.00 

3.21 

5.00 

6.00 

6.00 

AGNES 

3.08 

3.58 

dimension p=290 
2.65 2.13 

2.42 

2.78 

2.57 

2.52 

A-MSE 

2.13 

2.17 

1.93 

1.95 

3.23 

2.73 

1.88 

1.89 

PC 

2.86 

1.84 

2.35 

2.88 

3.00 

2.63 

2.35 

2.04 

StARS 

6.00 

5.99 

4.27 

3.50 

6.00 

6.00 

4.48 

3.87 

BIG 

3.56 

2.42 

3.80 

4.54 

3.88 

1.86 

3.73 

4.68 

AIC 

3.38 

5.01 

6.00 

6.00 

2.47 

5.00 

6.00 

6.00 

AGNES 

2.83 

3.53 

dimension p=500 
2.82 2.37 

2.33 

3.03 

3.29 

2.67 

A-MSE 

2.69 

2.08 

2.04 

2.02 

3.64 

2.33 

2.13 

2.03 

PC 

3.44 

2.13 

2.42 

3.37 

3.18 

2.53 

2.15 

2.28 

StARS 

6.00 

6.00 

4.02 

3.60 

6.00 

6.00 

3.65 

3.22 

BIG 

3.96 

2.30 

3.73 

3.80 

4.20 

2.10 

3.77 

4.81 

AIC 

2.08 

4.95 

5.97 

5.85 

1.65 

5.00 

6.00 

6.00 


Table 8. Average ranks for the mean sqnare errors of the Estrada index -GLasso 
estimates. 




Hubs- 

•based 



Power 

law 


n 

50 

100 

200 

500 

50 

100 

200 

500 

AGNES 

3.70 

3.43 

dimension p=50 
2.96 2.56 

3.82 

3.28 

2.58 

2.29 

A-MSE 

2.25 

2.02 

1.77 

1.66 

2.10 

1.52 

1.33 

1.21 

PC 

2.10 

1.35 

1.45 

1.87 

2.31 

1.92 

2.22 

2.52 

StARS 

5.72 

5.51 

5.83 

5.72 

5.83 

5.58 

5.97 

5.96 

BIG 

2.02 

3.22 

3.83 

3.92 

1.88 

3.29 

3.87 

3.98 

AIC 

5.21 

5.47 

5.17 

5.28 

5.07 

5.42 

5.03 

5.04 

AGNES 

4.03 

3.68 

dimension p=170 
2.93 2.37 

4.31 

3.73 

3.09 

2.58 

A-MSE 

2.48 

2.09 

1.83 

1.57 

2.14 

1.63 

1.60 

1.40 

PC 

1.99 

1.14 

1.41 

2.96 

2.04 

1.43 

1.52 

2.13 

StARS 

6.00 

5.78 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

1.82 

3.08 

3.84 

3.46 

1.88 

3.20 

3.78 

3.88 

AIC 

4.67 

5.22 

4.98 

4.65 

4.62 

5.00 

5.00 

5.00 

AGNES 

3.84 

3.83 

dimension p=290 
2.97 2.57 

4.65 

3.93 

3.13 

2.88 

A-MSE 

2.65 

2.12 

2.03 

1.87 

2.15 

1.61 

1.61 

1.69 

PC 

2.39 

1.44 

1.25 

2.24 

2.25 

1.43 

1.47 

1.71 

StARS 

6.00 

5.99 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

2.19 

2.62 

3.78 

3.52 

1.70 

3.04 

3.79 

3.72 

AIC 

3.92 

5.01 

4.97 

4.80 

4.25 

4.98 

5.00 

5.00 

AGNES 

4.42 

3.90 

dimension p=500 
3.14 2.50 

4.84 

3.90 

3.29 

2.83 

A-MSE 

2.67 

2.15 

1.98 

1.69 

2.19 

1.88 

1.93 

1.75 

PC 

2.29 

1.37 

1.42 

1.47 

2.37 

1.47 

1.50 

1.94 

StARS 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

2.01 

2.72 

3.51 

4.71 

1.57 

2.98 

3.36 

3.47 

AIC 

3.61 

4.87 

4.95 

5.00 

4.03 

4.77 

4.92 

5.00 
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Table 9. Average ranks for the mean sqnare errors of the average dissimilarity matrix 
-GLasso estimates. 




Hubs- 

•based 



Power 

law 


n 

50 

100 

200 

500 

50 

100 

200 

500 

AGNES 

2.17 

2.75 

dimension p=50 
2.34 2.19 

2.43 

2.67 

2.38 

2.11 

A-MSE 

2.75 

2.68 

1.95 

1.69 

2.52 

1.78 

1.63 

1.52 

PC 

4.05 

4.00 

2.65 

2.55 

2.58 

2.25 

2.32 

2.48 

StARS 

4.12 

4.46 

5.81 

5.72 

5.68 

5.53 

5.97 

5.96 

BIG 

4.28 

2.68 

3.12 

3.58 

3.26 

3.42 

3.67 

3.89 

AIC 

3.62 

4.42 

5.12 

5.28 

4.53 

5.35 

5.03 

5.04 

AGNES 

1.61 

2.43 

dimension p=170 
2.15 2.03 

2.59 

2.80 

2.84 

2.53 

A-MSE 

2.73 

1.94 

1.78 

1.25 

2.76 

2.22 

1.72 

1.38 

PC 

3.98 

3.16 

3.06 

3.59 

3.01 

2.43 

1.74 

2.10 

StARS 

6.00 

5.78 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

4.29 

2.48 

3.02 

3.48 

3.45 

2.55 

3.70 

3.98 

AIC 

2.39 

5.20 

4.98 

4.65 

3.19 

5.00 

5.00 

5.00 

AGNES 

1.57 

2.11 

dimension p=290 
2.27 2.33 

2.47 

2.85 

3.07 

2.78 

A-MSE 

3.00 

2.48 

1.98 

1.55 

3.15 

2.46 

1.48 

1.57 

PC 

4.06 

3.64 

2.85 

2.67 

3.12 

2.50 

1.72 

1.66 

StARS 

6.00 

5.99 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

4.64 

2.32 

2.93 

3.64 

3.85 

2.23 

3.74 

3.98 

AIC 

1.73 

4.46 

4.97 

4.80 

2.42 

4.97 

5.00 

5.00 

AGNES 

1.22 

1.75 

dimension p=500 
2.10 1.68 

1.82 

3.28 

3.23 

2.89 

A-MSE 

3.26 

2.93 

2.09 

1.67 

3.46 

2.27 

1.67 

1.53 

PC 

3.92 

4.28 

3.32 

3.15 

3.33 

2.62 

1.62 

1.61 

StARS 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

4.78 

2.52 

2.55 

3.39 

4.30 

1.92 

3.49 

3.97 

AIC 

1.82 

3.52 

4.94 

5.00 

2.09 

4.92 

5.00 

5.00 


Table 10. Average ranks for the absolnte error on the degree distribntion -GLasso 
estimates. 




Hubs- 

•based 



Power 

law 


n 

50 

100 

200 

500 

50 

100 

200 

500 

AGNES 

3.00 

3.09 

dimension p=50 
2.50 2.54 

3.69 

2.98 

2.62 

2.19 

A-MSE 

2.02 

1.77 

1.51 

1.55 

2.22 

1.56 

1.36 

1.27 

PC 

2.51 

1.95 

2.06 

2.05 

2.08 

2.23 

2.16 

2.54 

StARS 

5.72 

5.52 

5.78 

5.72 

5.88 

5.57 

5.97 

5.89 

BIG 

2.48 

3.20 

3.93 

3.87 

2.10 

3.24 

3.86 

3.99 

AIC 

5.28 

5.47 

5.22 

5.27 

5.04 

5.42 

5.03 

5.11 

AGNES 

3.08 

3.38 

dimension p=170 
2.92 2.58 

4.23 

3.81 

3.03 

2.74 

A-MSE 

1.99 

1.76 

1.67 

1.59 

2.11 

1.91 

1.67 

1.44 

PC 

2.73 

2.20 

1.53 

2.27 

1.82 

1.61 

1.42 

1.82 

StARS 

6.00 

5.86 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

2.94 

2.67 

3.90 

3.72 

2.20 

2.67 

3.88 

4.00 

AIC 

4.25 

5.14 

4.98 

4.83 

4.63 

5.00 

5.00 

5.00 

AGNES 

3.00 

3.55 

dimension p=290 
2.73 2.52 

4.48 

3.83 

3.11 

2.83 

A-MSE 

2.09 

1.83 

1.67 

1.53 

2.11 

1.77 

1.62 

1.44 

PC 

3.15 

2.26 

1.73 

2.40 

2.13 

1.59 

1.41 

1.75 

StARS 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

3.52 

2.36 

3.88 

3.68 

2.28 

2.80 

3.87 

3.98 

AIC 

3.23 

5.00 

5.00 

4.87 

3.99 

5.00 

5.00 

5.00 

AGNES 

2.65 

3.00 

dimension p=500 
2.84 2.54 

4.72 

3.95 

3.31 

2.85 

A-MSE 

2.80 

1.88 

1.72 

1.55 

2.59 

1.88 

1.62 

1.57 

PC 

3.27 

2.25 

1.78 

2.62 

1.94 

1.64 

1.41 

1.60 

StARS 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

6.00 

BIG 

4.19 

2.00 

3.90 

3.55 

2.71 

2.53 

3.66 

3.98 

AIC 

2.08 

4.64 

5.00 

4.75 

3.04 

5.00 

5.00 

5.00 
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Table 11. Average ranks for the mean sqnare errors of the harmonic mean -MB 
estimates. 




Hubs- 

•based 


Power 

law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

1.86 

1.99 

dimension p=50 
2.43 2.67 

1.95 

1.81 

1.95 

1.86 

A-MSE 

1.87 

1.48 

1.32 1.37 

2.26 

2.31 

1.77 

1.66 

PC 

3.12 

3.23 

2.50 2.07 

2.29 

2.25 

2.47 

2.53 

StARS 

3.15 

3.30 

3.75 3.90 

3.50 

3.63 

3.82 

3.95 

AGNES 

1.53 

2.18 

dimension p=170 
2.40 2.47 

1.39 

1.60 

1.82 

1.61 

A-MSE 

2.31 

1.43 

1.33 1.27 

3.08 

2.65 

2.54 

2.26 

PC 

3.42 

2.98 

2.62 2.58 

2.88 

2.83 

2.20 

2.32 

StARS 

2.73 

3.40 

3.65 3.68 

2.65 

2.92 

3.43 

3.82 

AGNES 

1.20 

1.98 

dimension p=290 
2.20 2.53 

1.20 

1.40 

1.41 

1.55 

A-MSE 

3.12 

1.53 

1.41 1.24 

3.52 

3.23 

2.91 

2.47 

PC 

3.35 

3.37 

2.92 2.51 

3.02 

2.73 

2.65 

2.35 

StARS 

2.33 

3.12 

3.47 3.72 

2.27 

2.65 

3.03 

3.63 

AGNES 

1.07 

1.65 

dimension p=500 
2.07 2.43 

1.05 

1.25 

1.72 

1.43 

A-MSE 

3.53 

1.96 

1.32 1.32 

3.74 

3.38 

2.77 

2.78 

PC 

3.30 

3.49 

3.34 2.62 

3.14 

3.08 

2.62 

2.54 

StARS 

2.10 

2.90 

3.27 3.62 

2.07 

2.28 

2.88 

3.25 


Table 12. Average ranks for the mean sqnare errors of the AGNES coefficient -MB 
estimates. 

Hubs-based Power law 


n 

50 

100 

200 

500 


50 

100 

200 

500 




dimension p; 

-50 





AGNES 

2.46 

2.77 

2.70 

2.82 


2.00 

2.33 

2.58 

2.20 

A-MSE 

1.90 

1.72 

1.32 

1.48 


2.26 

1.90 

1.61 

1.48 

PC 

1.64 

1.51 

2.00 

1.90 


1.76 

1.77 

1.87 

2.38 

StARS 

4.00 

4.00 

3.98 

3.80 


3.98 

4.00 

3.95 

3.93 




dimension p= 

M70 





AGNES 

2.67 

3.00 

3.22 

3.12 


2.39 

2.62 

2.66 

2.49 

A-MSE 

1.71 

1.83 

1.83 

1.71 


1.87 

1.78 

1.68 

1.66 

PC 

1.62 

1.17 

1.25 

1.64 


1.74 

1.60 

1.88 

2.18 

StARS 

4.00 

4.00 

3.70 

3.53 


4.00 

4.00 

3.78 

3.67 




dimension p= 

-290 





AGNES 

2.53 

2.97 

3.13 

3.47 


2.45 

2.85 

3.01 

2.93 

A-MSE 

1.80 

1.73 

1.77 

1.73 


1.95 

1.59 

1.61 

1.53 

PC 

1.67 

1.30 

1.29 

1.59 


1.60 

1.56 

1.80 

2.12 

StARS 

4.00 

4.00 

3.80 

3.22 


4.00 

4.00 

3.58 

3.42 




dimension p= 

-500 





AGNES 

2.57 

2.97 

3.20 

3.63 


2.32 

2.92 

3.17 

3.01 

A-MSE 

1.73 

1.64 

1.81 

1.73 


2.11 

1.58 

1.56 

1.77 

PC 

1.70 

1.39 

1.19 

1.74 


1.57 

1.50 

1.57 

2.11 

StARS 

4.00 

4.00 

3.80 

2.90 


4.00 

4.00 

3.70 

3.12 


Table 13. Average ranks for the mean sqnare errors of the Estrada index -MB 
estimates. 




Hubs- 

•based 


Power 

law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

2.29 

2.56 

dimension p=50 
2.75 2.82 

2.48 

2.44 

2.32 

1.92 

A-MSE 

1.67 

1.62 

1.33 1.47 

1.79 

1.66 

1.50 

1.43 

PC 

2.04 

1.82 

1.92 1.72 

1.73 

1.90 

2.18 

2.65 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.57 

2.67 

dimension p=170 
2.93 2.85 

2.08 

2.32 

2.33 

2.11 

A-MSE 

1.98 

1.45 

1.53 1.59 

2.02 

1.83 

1.71 

1.62 

PC 

2.46 

1.88 

1.53 1.56 

1.91 

1.85 

1.97 

2.27 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.33 

2.20 

dimension p=290 
2.70 2.92 

1.52 

1.98 

2.34 

2.13 

A-MSE 

2.23 

1.62 

1.39 1.54 

2.48 

2.17 

1.86 

1.77 

PC 

2.43 

2.18 

1.91 1.54 

2.00 

1.84 

1.80 

2.10 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.00 

1.48 

dimension p=500 
2.62 2.90 

1.22 

1.83 

2.32 

2.02 

A-MSE 

2.55 

1.84 

1.31 1.49 

2.66 

2.22 

1.88 

1.95 

PC 

2.45 

2.67 

2.08 1.61 

2.12 

1.95 

1.81 

2.02 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 
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Table 14. Average ranks for the mean sqnare errors of the dissimilarity matrix -MB 
estimates. 


Hubs-based Power law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

2.21 

2.24 

dimension p=50 
2.52 2.67 

2.53 

2.53 

2.36 

1.97 

A-MSE 

1.82 

1.43 

1.30 1.42 

1.57 

1.33 

1.54 

1.33 

PC 

2.49 

2.59 

2.18 1.92 

1.94 

2.13 

2.10 

2.70 

StARS 

3.48 

3.73 

4.00 4.00 

3.95 

4.00 

4.00 

4.00 

AGNES 

2.87 

2.58 

dimension p=170 
2.65 2.62 

2.94 

2.97 

2.54 

2.12 

A-MSE 

1.51 

1.42 

1.40 1.29 

1.40 

1.47 

1.54 

1.41 

PC 

1.62 

2.00 

1.95 2.09 

1.66 

1.57 

1.92 

2.47 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

2.88 

2.55 

dimension p=290 
2.60 2.72 

3.0 

3.00 

2.84 

2.55 

A-MSE 

1.52 

1.40 

1.26 1.31 

1.3 

1.39 

1.43 

1.47 

PC 

1.60 

2.05 

2.14 1.98 

1.7 

1.61 

1.73 

1.98 

StARS 

4.00 

4.00 

4.00 4.00 

4.0 

4.00 

4.00 

4.00 

AGNES 

2.90 

2.40 

dimension p=500 
2.45 2.67 

3.00 

3.00 

3.00 

2.34 

A-MSE 

1.47 

1.27 

1.18 1.34 

1.31 

1.38 

1.39 

1.62 

PC 

1.63 

2.33 

2.38 1.99 

1.69 

1.62 

1.61 

2.04 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 


Table 15. Average ranks for the mean sqnare errors of the average dissimilarity matrix 
-MB estimates 




Hubs- 

•based 


Power 

law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

1.69 

1.92 

dimension p=50 
2.1 2.27 

1.85 

1.97 

1.98 

1.95 

A-MSE 

2.63 

2.20 

1.6 1.57 

2.04 

2.00 

1.71 

1.62 

PC 

3.17 

3.09 

2.3 2.17 

2.38 

2.10 

2.32 

2.43 

StARS 

2.50 

2.78 

4.0 4.00 

3.73 

3.93 

4.00 

4.00 

AGNES 

1.00 

1.03 

dimension p=170 
1.52 2.07 

1.16 

1.23 

1.61 

1.82 

A-MSE 

2.29 

2.10 

1.87 1.69 

2.53 

2.38 

2.38 

1.96 

PC 

2.71 

2.87 

2.62 2.24 

2.31 

2.38 

2.02 

2.22 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.00 

1.02 

dimension p=290 
1.43 2.37 

1.02 

1.10 

1.26 

1.72 

A-MSE 

2.42 

2.10 

1.94 1.54 

2.73 

2.64 

2.42 

2.20 

PC 

2.58 

2.88 

2.62 2.09 

2.25 

2.26 

2.32 

2.08 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.00 

1.00 

dimension p=500 
1.20 2.43 

1.00 

1.02 

1.03 

1.34 

A-MSE 

2.55 

2.09 

1.94 1.59 

2.77 

2.55 

2.56 

2.37 

PC 

2.45 

2.91 

2.86 1.98 

2.23 

2.43 

2.41 

2.29 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 


Table 16. Average ranks for the mean sqnare errors of the degree distribntion -MB 
estimates. 




Hubs- 

•based 


Power 

law 


n 

50 

100 

200 500 

50 

100 

200 

500 

AGNES 

2.39 

2.45 

dimension p=50 
2.61 2.8 

2.30 

2.18 

2.19 

1.89 

A-MSE 

1.56 

1.51 

1.28 1.4 

1.90 

1.93 

1.59 

1.38 

PC 

2.05 

2.04 

2.11 1.8 

1.80 

1.88 

2.22 

2.73 

StARS 

4.00 

4.00 

4.00 4.0 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.94 

2.72 

dimension p=170 
2.88 2.84 

1.76 

1.82 

1.92 

1.68 

A-MSE 

1.71 

1.35 

1.41 1.38 

2.26 

2.10 

2.15 

2.07 

PC 

2.35 

1.93 

1.71 1.78 

1.98 

2.08 

1.93 

2.26 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.44 

2.5 

dimension p=290 
2.85 2.87 

1.32 

1.51 

1.52 

1.64 

A-MSE 

2.17 

1.3 

1.23 1.37 

2.58 

2.48 

2.38 

2.33 

PC 

2.39 

2.2 

1.92 1.77 

2.10 

2.02 

2.10 

2.03 

StARS 

4.00 

4.0 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 

AGNES 

1.18 

2.02 

dimension p=500 
2.73 2.87 

1.12 

1.31 

1.68 

1.35 

A-MSE 

2.44 

1.48 

1.07 1.31 

2.71 

2.45 

2.23 

2.53 

PC 

2.38 

2.51 

2.20 1.82 

2.17 

2.24 

2.09 

2.12 

StARS 

4.00 

4.00 

4.00 4.00 

4.00 

4.00 

4.00 

4.00 
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7.5 A-MSE and oracle tuning parameter 

In Figure 5 we present a comparison between the A selection by A-MSE and the 
that minimizes 

RmseW = (13) 

i>j 

We use 60 simulated data sets for each n/p combination with the true models being a 
power-law distribution as dehned in Section 7.1, Similar results are found when using 
hubs-based models. A-MSE selections are reasonably close to the best A in almost all 
presented scenarios, and in all cases, the oracle value of A is within the 95% conhdence 
interval for the median of Xamse- 



(c) p=290 (d) p=500 

Figure 5. A-MSE A selections against the oracle best A for dissimilarities mean square 
errors. 
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7.6 AGNES by variable subset approach 

Here, we test two different ways of computing the variable subsets to approximate the 
AGNES coefficient in high-dimensional data: totally random subsets and the strategy 
proposed in Section 3.3 in the article (which we denote it by intelligent subsets). In 
Figure 6 we show the A errors (a, b, c) and the relative time reduction (d, e, f) for 
the subset approaches with respect to the procedure using all the variables. We use 50 
replicates for three different power-law true models with large p (= 700,1400,1645) and 
sample size n = 100. For large dimension size (i.e. p > 1000), the subset approach 
reduces considerably the computational time while maintaining a similar A estimation as 
the one given using the whole data. 


^ int.subset 
$ int.subset 



method 


(a) A errors (p=700) 



method 


(d) Relative time reduction 
(p=700) 



method 


(b) A errors (p=1400) 


0.70- 



method 


(e) Relative time reduction 
(p=1400) 


method 

^irt.subs 



method 


(c) A errors (p=1645) 


0.75 



method 


(f) Relative time reduction 
(p=1645) 


Figure 6. Estimation of the best graph by subset selection in the AGNES approach: 
Errors in the A selection with respect to AGNES with all variables and relative time 
reduction with respect to the AGNES procedure. 
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